DEC 0 1 



ORlGimL 



HEWLETT-PACKARD COMPANY 
Intellectual Property Administration 
P.O. Box 272400 

Fort Collins, Colorado 80527-2400 

IN THE 

UNITED STATES PATENT AND TRADEMARK OFFICE 



PATENT APPLICATION 
ATTORNEY DOCKET NO. 200301 732-1 



Inventor(s): Steven O. HOBBS et al. 
Application No.: 09/785,143 
Filing Date: 02/16/2001 



Confirmation No.: 4992 
Examiner: A. R. Fowlkes 
Group Art Unit: 2192 



Title: METHOD AND APPARATUS FOR REDUCING CACHE THRASHING 



Mail Stop Appeal Brief-Patents 
Commissioner For Patents 
PO Box 1450 

Alexandria, VA 22313-1450 



TRANSMITTAL OF APPEAL BRIEF 



Transmitted herewith is the Appeal Brief in this application with respect to the Notice of Appeal filed on 09/29/2005 
The fee for filing this Appeal Brief is (37 GFR 1.17(c)) $500.00. 

(complete (a) or (b) as applicable) 
The proceedings herein are for a patent application and the provisions of 37 CFR 1.136(a) apply. 

□ (a) Applicant petitions for an extension of time under 37 CFR 1.136 (fees: 37 CFR 1.17(a)-(d)) for the total number of 
months checked below: 



, — I 1st Month 
I— I $120 



□ 



2nd Month 
$450 



□ 



3rd Month 
$1020 



□ 



4th Month 
$1590 



n The extension fee has already been filed in this application. 

[x](b) Applicant believes that no extension of time is required. However, this conditional petition is being made to provide for 
the possibility that applicant has inadvertently overlooked the need for a petition and fee for extension of time. 

Please charge to Deposit Account 08-2025 the sum of $ 500 At any time during the pendency of this application, 
please charge any fees required or credit any over payment to Deposit Account 08-2025 pursuant to 37 CFR 1.25. 
Additionally please charge any fees to Deposit Account 08-2025 under 37 CFR 1.16 through 1.21 inclusive, and any other 
sections in Title 37 of the Code of Federal Regulations that may regulate fees. A duplicate copy of this sheet is enclosed. 



[x] I hereby certify that this correspondence is being 
deposited with the United States Postal Service as first 
class mail in an envelope addressed to: 
Commissioner for Patents, Alexandria, VA 22313-1450 

Date of Deposit: 11/28/2005 
OR 

I I I hereby certify that this paper is being transmitted to 
the Patent and Trademark Office facsimile number 
(571)273-8300. 



Respectfully submitted 
Steven O. HOBBS et al 



Date of facsimile: 
Typed Name: 
Signature 



Colleen F. Brown 



Alan D. Christenson 
Attorney/Agent for Applicant(s) 

Reg No. : 54.036 
Date: 11/28/2005 
Telephone : (713) 238-8000 



Rev 10/05 (AplBrief) 




IN THE UNITED STATES PATENT AND TRADEMARK OFFICE 



Appellants: 


Steven O. HOBBS et al. 


§ 


Confirmation No.: 4992 


Serial No.: 


09/785,143 


§ 
§ 


Group Art Unit: 2192 


Filed: 


02/16/2001 


§ 
§ 


Examiner: Andre R. Fowlkes 


For: 


Method And Apparatus 


§ 
§ 


Docket No.: 200301732-1 




For Reducing Cache 


§ 






Thrashing 


§ 





APPEAL BRIEF 



Mail Stop Appeal Brief - Patents 

Commissioner for Patents 
PO Box 1450 

Alexandria, VA 22313-1450 
Sir: 



Date: November 28, 2005 



Appellants hereby submit this Appeal Brief in connection with the above- 
identified application. A Notice of Appeal was filed via facsimile on September 29, 
2005. 



iE/02/2005 DEHRftHUi 00000065 082025 09755143 
01 FC:1402 500.00 DA 



59991.01/1662.48400 



Page 1 of 50 



HP PDNO 200301732-1 



Appl. No. 09/785,143 

Appeal Brief dated November 28, 2005 

Reply to final Office action of July 27, 2005 



TABLE OF CONTENTS 

I. REAL PARTY IN INTEREST 3 

II. RELATED APPEALS AND INTERFERENCES 4 

III. STATUS OF THE CLAIMS 5 

IV. STATUS OF THE AMENDMENTS 6 

V. SUMMARY OF THE CLAIMED SUBJECT MATTER 7 

VI. GROUNDS OF REJECTION TO BE REVIEWED ON APPEAL 18 

VII. ARGUMENT 19 

A. Overview of Carr 19 

B. Overview of McGehearty 20 

C. Did amendments to claims 1, 9, 17, 24, 26 (new claim 41), 

33 and 35 introduce new matter? 21 

D. Claims 1-8, with claim 1 representing this group 21 

E. Claims 9-16, with claim 9 representing this group 23 

F. Claims 17-23, with claim 17 representing this group 24 

G. Claims 24, 35-37 and 39-40, with claim 35 representing this 
group 25 

H. Claims 33 and 34, with claim 33 representing this group 26 

I. Claims 41-47, with claim 41 representing this group 27 

VIII. CONCLUSION 28 

IX. CLAIMS APPENDIX 29 

X. EVIDENCE APPENDIX 49 

XI. RELATED PROCEEDINGS APPENDIX 50 



59991.01/1662.48400 



Page 2 of 50 



HPPDNO 200301732-1 



AppI- No. 09/785,143 

Appeal Brief dated November 28, 2005 

Reply to final Office action of July 27, 2005 

I. REAL PARTY IN INTEREST 

The real party in interest is the Hewlett-Packard Development Company 
(HPDC), a Texas Limited Partnership, having its principal place of business in 
Houston, Texas. HPDC is a wholly owned affiliate of Hewlett-Packard Company 
(HPC). HPC merged with Compaq Computer Corporation (CCC) which owned 
Compaq Information Technologies Group, LP. (CITG). The Assignment from the 
inventors to CCC was recorded on February 16, 2001, at Reel/Frame 
011614/0531. The Assignment from CCC to CITG was recorded on June 7, 
2005, at Reel/Frame 016306/0921. The Change of Name document was 
recorded on June 8, 2005, at Reel/Frame 016313/0854. 
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II. RELATED APPEALS AND INTERFERENCES 

Appellants are unaware of any related appeals or interferences. 
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III. STATUS OF THE CLAIMS 

In the final Office action dated July 27, 2005, the Exanniner rejected claims 1-40. 
Appellants submitted a response to the final Office action on September 27, 2005, in 
which Appellants unintentionally canceled claims 26-32. On November 17, 2005, 
Appellants submitted an Amendment after Notice of Appeal to cancel claims 25 and 38 
and to insert the content of the previously canceled claims 26-32 into new claims 41-47. 
Appellants are appealing the rejection of claims 1-24 and 33-47. 
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IV. STATUS OF THE AMENDMENTS 

A Response to Final Office Action dated July 27, 2005, was filed with the 
U.S. Patent and Trademark Office on September 27, 2005, In the Response, 
claims 26-32 were unintentionally canceled. On November 17, 2005, Appellants 
submitted an Amendment after Notice of Appeal to cancel claims 25 and 38 and to 
insert the content of the previously canceled claims 26-32 into new claims 41-47. 
The amendment was not a substantive amendment, does not require further 
searching, and merely puts the remaining claims in better condition for appeal. 
Appellants assume these amendments have been or will be entered. The Claims 
Appendix includes two sets of claims: 1) a copy of the claims to be considered 
for appeal (claims 1-24 and 33-47); and 2) a copy of the claims rejected by the 
Examiner in the final Office action dated July 27, 2005 (claims 1-40). 
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V, SUMMARY OF THE CLAIMED SUBJECT MATTER 

The following provides a concise explanation of the subject matter 
defined in each of the claims involved in the appeal, referring to the specification 
by page and line number and to the drawings by reference characters, as 
required by 37 C.F.R. § 41.37(c)(1)(v). Each element of the claims is identified 
by a corresponding reference to the specification and drawings where 
applicable (i.e., the claims are annotated using parenthesis). Note that the 
citation to passages in the specification and drawings for each claim element 
does not imply that the limitations from the specification and drawings should be 
read into the corresponding claim element. 

1. (Previously presented) A method, comprising: 

identifying a loop in a program (see Fig. 4 and p. 13, lines 23-25); 
identifying each vector memory reference in the loop (see Fig. 4; p. 13, 
lines 5-7; and p. 13, lines 16-19); 

determining dependencies between vector memory references in the loop, 
including determining unidirectional and circular dependencies (see Fig. 4 and p. 
13, line 19 -p. 14, line 2); 

distributing the vector memory references into a plurality of detail loops 
configured to allocate the vector memory references into a plurality 
of temporary arrays, sized and located, so that none of the vector 
memory references are cache synonyms, wherein the vector 
memory references that have circular dependencies therebetween 
are included in a common detail loop, and wherein the detail loops 
are ordered according to the unidirectional dependencies between 
the memory references (see Fig 4 and p. 14, line 21 - p. 15, line 
24); 

analyzing an execution profile of the program after said distributing (see 

Fig. 2 and p. 8, line 1 1 - p. 9, line 7); and 
based on the execution profile, determining whether to repeat said 

identifying a loop, said identifying each vector memory reference, 
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said determining dependencies, and said distributing (see Fig. 2 
and p. 8, line 11 - p. 9, line 7). 

2. (Original) A method, as set forth in claim 1, further comprising allocating a 
plurality of temporary storage areas within a cache and determining the size of 
each temporary storage area based on the size of the cache and the number of 
temporary storage areas (p. 15, lines 12-24). 

3. (Original) A method, as set forth in claim 1, further comprising at least one 
section loop including the plurality of detail loops (p. 15, lines 4-5). 

4. (Original) A method, as set forth in claim 1, wherein distributing the vector 
memory references into a plurality of detail loops further comprises distributing 
the vector memory references into a plurality of detail loops that each contain at 
least one vector memory reference that could benefit from cache management 
(see 320 of Fig. 4 and p. 14, lines 4-14). 

5. (Original) A method, as set forth in claim 1, further comprising inserting 
cache management instructions into at least one of said detail loops to control 
movement of data associated with the vector memory reference between a cache 
and main memory (p. 18, lines 6-9 and p. 20, lines 27-37). 

6. (Original) A method, as set forth in claim 1, further comprising inserting 
prefetch instructions into at least one of said detail loops to control movement of 
data associated with the vector memory reference between a cache and main 
memory (p. 18, lines 9-25). 

7. (Original) A method, as set forth in claim 1, further comprising performing 
loop unrolling on at least one of said detail loops to control movement of data 
associated with the vector memory reference between a cache and main memory 
(p. 19, line 1 - p. 20, line 37). 
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8. (Original) A method, as set forth in claim 1, further comprising inserting at 
least one of a prefetch instruction and a cache management instruction into at 
least one of said detail loops to control movement of data associated with the 
vector memory reference between a cache and main memory, and performing 
loop unrolling on at least one of said detail loops to control movement of data 
associated with the vector memory reference between a cache and main memory 
(p, 18, line 7 - p. 20, line 37). 

9. (Previously presented) A method, comprising: 

identifying a loop in a program (see Fig. 4 and p. 13, lines 23-25); 

identifying each vector memory reference in the loop (see Fig. 4; p. 13, 
lines 5-7; and p. 13, lines 16-19); 

determining dependencies between vector memory references in the loop 
(see Fig. 4 and p. 13, line 19 - p. 14, line 2); and 

distributing the vector memory references into a plurality of detail loops 
that serially proceed through strips of the vector memory references 
and store the strips in temporary arrays so that none of the vector 
memory references are cache synonyms, wherein the vector 
memory references that have dependencies therebetween are 
included in a common detail loop (see Fig 4 and p. 14, line 21 - p. 
15, line 24); 

wherein said distributing the vector memory references into a plurality of 
detail loops is performed by a first computer for execution by a 
second computer (p. 7, lines 19-22). 

10. (Original) A method, as set forth in claim 9, further comprising allocating a 
plurality of temporary storage areas within a cache and determining the size of 
each temporary storage area based on the size of the cache and the number of 
temporary storage areas (p. 15, lines 12-24). 
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1 1 . (Original) A method, as set forth in claim 9, further comprising at least one 
section loop including the plurality of detail loops (p. 15, lines 4-5). 

12. (Original) A method, as set forth in claim 9, wherein distributing the vector 
memory references into a plurality of detail loops further comprises distributing 
the vector memory references into a plurality of detail loops that each contain at 
least one vector memory reference that could benefit from cache management 
(see 320 of Fig. 4 and p. 14, lines 4-14). 

13. (Original) A method, as set forth in claim 9, further comprising inserting 
cache management instructions into at least one of said detail loops to control 
movement of data associated with the vector memory reference between a cache 
and main memory (p. 18, lines 6-9 and p. 20, lines 27-37). 

14. (Original) A method, as set forth in claim 9, further comprising inserting 
prefetch instructions into at least one of said detail loops to control movement of 
data associated with the vector memory reference between a cache and main 
memory (p. 18, lines 9-25). 

15. (Original) A method, as set forth in claim 9, further comprising performing 
loop unrolling on at least one of said detail loops to control movement of data 
associated with the vector memory reference between a cache and main memory 
(p. 19, line 1 - p. 20, line 37). 

16. (Original) A method, as set forth in claim 9, further comprising inserting at 
least one of a prefetch instruction and a cache management instruction into at 
least one of said detail loops to control movement of data associated with the 
vector memory reference between a cache and main memory, and performing 
loop unrolling on at least one of said detail loops to control movement of data 
associated with the vector memory reference between a cache and main memory 
(p. 18, line 7 - p. 20, line 37). 
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17. (Previously presented) A method, comprising: 

identifying a loop in a program (see Fig. 4 and p. 13, lines 23-25); 

identifying each vector memory reference in the loop (see Fig. 4; p. 13, 
lines 5-7; and p. 13, lines 16-19); 

determining dependencies between vector memory references in the loop 
(see Fig. 4 and p. 13, line 19 - p. 14, line 2); 

distributing the vector memory references into a plurality of detail loops in 
response to cache behavior and the dependencies between the 
vector memory references in the loop, wherein the detail loops 
cause storage of the vector memory references in temporary arrays 
that are allocated consecutively so that no temporary arrays 
elements are cache synonyms (see Fig 4 and p. 14, line 21 - p. 15, 
line 24), 

wherein said identifying a loop, said identifying each vector memory 
reference, said determining dependencies between vector memory 
references and said distributing the vector memory references into 
a plurality of detail loops produce code that is substantially 
independent of a computer architecture (p. 8, lines 1-7); and 

performing code optimizations that are dependent on a computer 
architecture after said distributing (p. 8, lines 18-22). 

18. (Original) A method, as set forth in claim 17, wherein distributing the 
vector memory references further comprises distributing the vector memory 
references into the plurality of detail loops with each loop having at least one of 
the identified vector memory references (see Fig 4 and p. 14, line 21 - p. 15, line 
24). 

19. (Original) A method, as set forth in claim 17, further comprising 
determining dependencies between vector memory references in the loop, and 
wherein distributing the loop includes distributing the vector memory references 
into the plurality of detail loops, wherein the vector memory references that have 
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circular dependencies therebetween are included in a common detail loop (p. 13, 
lines 16-21). 

20. (Original) A method, as set forth in claim 17, further comprising inserting 
cache management instructions into at least one of said detail loops to control 
movement of data associated with the vector memory reference between a cache 
and main memory (p. 18, lines 6-9 and p. 20, lines 27-37). 

21. (Original) A method, as set forth in claim 17, further comprising inserting 
prefetch instructions into at least one of said detail loops to control movement of 
data associated with the vector memory reference between a cache and main 
memory (p. 18, lines 9-25). 

22. (Original) A method, as set forth in claim 17, further comprising performing 
loop unrolling on at least one of said detail loops to control movement of data 
associated with the vector memory reference between a cache and main memory 
(p. 19, line 1 - p. 20, line 37). 

23. (Original) A method, as set forth in claim 17, further comprising inserting 
at least one of a prefetch instruction and a cache management instruction into at 
least one of said detail loops to control movement of data associated with the 
vector memory reference between a cache and main memory, and performing 
loop unrolling on at least one of said detail loops to control movement of data 
associated with the vector memory reference between a cache and main memory 
(p. 18, line 7 -p. 20, line 37). 

24. (Previously presented) A computer programmed to perform a method, 
comprising: 

identifying a loop in a program (see Fig. 4 and p. 13, lines 23-25); 
identifying each vector memory reference in the loop (see Fig. 4; p. 13, 
lines 5-7; and p. 13, lines 16-19); 
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determining dependencies between vector memory references in the loop 
(see Fig. 4 and p. 1 3, line 1 9 - p. 14, line 2); and 

distributing the vector memory references into a plurality of detail loops 
configured to retrieve strips of the vector memory references and 
store the strips in temporary arrays, wherein the vector memory 
references that have circular dependencies therebetween are 
included in a common detail loop (see Fig 4 and p. 14, line 21 - p. 
15, line 24), 

wherein the temporary arrays are configured to simultaneously fit in a 
single cache bank (p. 15, lines 16-20 and p. 17, line 11-15). 

33. (Previously presented) A method for reducing the likelihood of cache 
thrashing by software to be executed on a computer system (100) having a cache 
(210, 220), comprising: 

executing the software on the computer system (100) (Fig. 1 and p. 7, 
lines 6-13); 

generating a profile indicating the manner in which the software uses the 

cache (210, 220) (Fig. 2 and p. 8, lines 11-22); 
identifying a portion of the software that exhibits cache thrashing based on 

the profile data (p. 8, lines 11-22 and p. 12, line 23 - p. 13, line 3); 

and 

modifying the identified portion of the software to reduce the likelihood of 
cache thrashing by distributing cache synonyms into detail loops 
configured to allocate the cache synonyms into temporary storage 
areas, sized and located, to prevent cache thrashing (p. 14, line 21- 
p. 15, line 24), 

wherein said modifying occurs before optimizations that are based on an 
architecture of the computer system (100) (p. 8, lines 14-22). 
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34. (Previously presented) A method, as set forth in claim 33, wherein 
modifying the identified portion of the software to reduce the likelihood of cache 
thrashing further comprises: 

identifying a loop in the identified portion of the software (see Fig. 4 and p. 
13, lines 23-25); 

identifying each vector memory reference in the identified loop (see Fig. 4; 
p. 13, lines 5-7; and p. 13, lines 16-19); 

determining dependencies between the vector memory references in the 
identified loop of the software, including determining unidirectional 
and circular dependencies (see Fig. 4 and p. 13, line 19 - p. 14, line 
2); and 

reducing cache thrashing by distributing the vector memory references into 
a plurality of detail loops, wherein the vector memory references 
that have circular dependencies therebetween are included in a 
common detail loop, and wherein the detail loops are ordered 
according to the unidirectional dependencies between the memory 
references (see Fig 4 and p. 14, line 21 - p. 15, line 24). 

35. (Previously presented) A method for reducing the likelihood of cache 
thrashing by software to be executed on a computer system (100) having a cache 
(210, 220), comprising: 

executing the software on the computer system (100) (Fig. 1 and p. 7, 
lines 6-13); 

generating a profile indicating the manner in which the memory references 
of the software use the cache (210, 220) (Fig. 2 and p. 8, lines 11- 

22); 

identifying a portion of the memory references based on the profile, 
wherein the portion of the memory references is determined to 
cause cache thrashing (p. 8, lines 11-22 and p. 12, line 23 - p. 13, 
line 3); and 
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reducing cache thrashing by distributing the portion of the memory 
references into distinct loops that allocate strips of the memory 
references into temporary arrays for execution (p. 14, line 21 - p. 
15, line 24), 

wherein the temporary arrays are configured to simultaneously fit in a 
single cache bank (p. 15, lines 16-20 and p. 17, line 11-15). 

36. (Previously presented) The computer of claim 24 wherein the temporary 
arrays are allocated consecutively such that no temporary array elements are 
cache synonyms (p. 17, lines 10-15). 

37. (Previously presented) The computer of claim 24 wherein the details loops 
are allocated into section loops that cause iterative execution of the detail loops 
based on a size of the strips (p. 17, lines 17-27). 

39. (Previously presented) The method of claim 35 wherein the temporary 
arrays are located and sized to reduce cache thrashing (p. 15, lines 12-24). 

40. (Previously presented) The method of claim 35 wherein the temporary 
arrays are allocated consecutively and iteratively executed by a size of the strips 
(p. 17, lines 10-27). 

41. (Previously presented) A compiler, comprising: 

means for identifying a loop in a program (see Fig. 4 and p. 13, lines 23- 
25); 

means for identifying each vector memory reference in the loop (see Fig. 
4; p. 13, lines 5-7; and p. 13, lines 16-19); 

means for determining dependencies between vector memory references 
in the loop, including determining unidirectional and circular 
dependencies (see Fig. 4 and p. 13, line 19 - p. 14, line 2); 
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means for distributing the vector memory references into a plurality of 
detail loops configured to serially process strips of the vector 
memory references so that thrashing does not occur, wherein the 
vector memory references that have circular dependencies 
therebetween are included in a common detail loop, and wherein 
the detail loops are ordered according to the unidirectional 
dependencies between the memory references (see Fig 4 and p. 
14, line 21 - p. 15, line 24); 

means for determining an execution profile of the program after said 
distributing occurs (p. 8, lines 1-9); and 

means for selectively repeating use of said means for identifying a loop, 
said means for identifying each vector memory reference, said 
means for determining dependencies, said means for distributing 
the vector memory references into a plurality of detail loops, and 
said means for determining an execution profile based on said 
execution profile (p. 8, line 11, p. 9, line 7). 

42. (Original) A compiler, as set forth in claim 41, further comprising means 
for allocating a plurality of temporary storage areas within a cache and 
determining the size of each temporary storage area based on the size of the 
cache and the number of temporary storage areas (p. 15, lines 12-24). 

43. (Original) A compiler, as set forth in claim 41, wherein the means for 
distributing the vector memory references into a plurality of detail loops further 
comprises distributing the vector memory references into a plurality of detail loops 
that each contain at least one vector memory reference that could benefit from 
cache management (see 320 of Fig. 4 and p. 14, lines 4-14). 

44. (Original) A compiler, as set forth in claim 41, further comprising inserting 
cache management instructions into at least one of said detail loops to control 
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movement of data associated with the vector memory reference between a cache 
and main memory (p. 18, lines 6-9 and p. 20, lines 27-37). 

45. (Original) A compiler, as set forth in claim 41, further comprising means 
for inserting prefetch instructions into at least one of said detail loops to control 
movement of data associated with the vector memory reference between a cache 
and main memory (p. 18, lines 9-25). 

46. (Original) A compiler, as set forth in claim 41, further comprising means 
for performing loop unrolling on at least one of said detail loops to control 
movement of data associated with the vector memory reference between a cache 
and main memory (p. 19, line 1 - p. 20, line 37). 

47. (Original) A compiler, as set forth in claim 41, further comprising means 
for inserting at least one of a prefetch instruction and a cache management 
instruction into at least one of said detail loops to control movement of data 
associated with the vector memory reference between a cache and main 
memory, and performing loop unrolling on at least one of said detail loops to 
control movement of data associated with the vector memory reference between 
a cache and main memory (p. 18, line 7 - p. 20, line 37). 
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VL GROUNDS OF REJECTION TO BE REVIEWED ON APPEAL 

Did amendments to claims 1, 9, 17, 24, . 26 (new claim 41), 33 and 35 
introduce new matter? 

Whether claims 1-8 are unpatentable over "Compiler Optimizations for 
Improving Data Locality" by Steve Carr et at. (hereinafter referred to as the "Carr" 
reference) in view of U.S. Pat. No. 6,029,225 ("McGehearty"). 

Whether claims 9-16 are unpatentable over Carr in view of McGehearty. 

Whether claims 17-23 are unpatentable over Carr in view of McGehearty. 

Whether claims 24, 35-37 and 39-40 are unpatentable over Carr in view of 
McGehearty. 

Whether claims 33-34 are unpatentable over Carr in view of McGehearty. 
Whether claims 41-47 are unpatentable over Carr in view of McGehearty. 
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VIL ARGUMENT 

The claims do not stand or fall together. Instead, Appellants present 
separate arguments for various independent and dependent claims. Each of 
these arguments is separately argued below and presented with separate 
headings and sub-headings as required by 37 C.F.R. § 41.37 (c)(1)(vii). 

A. Overview of Carr 

Carr teaches a compiler that optimizes data locality based on a model that 

computes both temporal and spatial reuse of cache lines to find desirable loop 

organizations. In describing the compiler, Carr states: 

The cost model drives the application of compound 
transformations consisting of loop transformation, loop fusion, loop 
distribution, and loop reversal. 

Page 252, Col. L, lines 7-10 

By driving heuristics with a cache model, our algorithms are 
efficient and usually find the best transformations for data locality 
using permutation, fusion and distribution. 

Page 253, Col. L, lines 47-50 

Loop distribution separates independent statements in a single loop 
into multiple loops with identical headers. ...In our system we only 
use loop distribution to indirectly improve reuse by enabling loop 
permutation on a nest that is not permutable. ...We only call 
Distribute if memory order cannot be achieved on a nest and not all 
the inner nests can be fused. ...We thus perform distribution only if 
it combines with permutation to improve the actual LoopCost. 

Page 256, Col. L, line 48 - Col. R, line 33 

Distribution could also be effective if there is no temporal 
locality between partitions and the accessed arrays are too 
numerous to fit in cache at once, or register pressure is a concern. 
We do not address these issues here. 

Page 256, Footnote 3 

Based on the above, Carr suggests that distribution could be effective if 
accessed arrays are too numerous to fit in cache at once, but states "we do not 
address these concerns here," Accordingly, while Carr teaches a compiler that 
uses loop distribution in some circumstances, Carr does not address using 
distribution when accessed arrays are too numerous to fit in cache at once. In 

159991.01/1662.48400 PagO 1 9 Of 50 HP PDNO 200301732-1 



AppL No. 09/785,143 

Appeal Brief dated November 28, 2005 

Reply to final Office action of July 27, 2005 



contrast, Appellants' claimed invention uses distribution when accessed arrays 
are too numerous to fit in cache at once (causing cache thrashing). 
B. Overview of McGehearty 

McGehearty teaches a cache collision avoidance mechanism. In 

describing the mechanism, McGehearty states: 

[i]f the source and destination addresses have different 
cache addresses then the data flow into and out of the cache 
operates in an enhanced mode. If the source and destination 
addresses have nearly matching cache addresses then the data 
flow into and out of the cache operates in a near cache collision 
mode. If the source and destination addresses have exactly 
matching addresses then the data flow into and out of the cache 
operates in an exact cache collision avoidance mode. 

Col. 2, lines 19-27 

The enhanced mode of cache bank conflict avoidance 
carefully orders loads and stores so that loads to one memory bank 
are performed on the same clock cycles as the stores to the other 
memory bank. After a group of loads and stores are completed, 
then operations for each bank are switched so that a group of 
stores are performed on one bank and a group of loads are 
performed to the other bank. 

Col. 2, lines 27-39 

The near cache collision avoidance mode determines the 
relative locations of the source and destination addresses within the 
cache. If the source address is slightly after the destination, then 
the enhanced mode will function properly. If the destination 
address is slightly after the source address, the groups of cache 
lines is loaded into registers, and then the registers are stored to 
memory without any interleaving of other loads and stores. 

Col. 2, lines 40-47 

The exact cache collision avoidance mode restructures the 
loop of moving data to form a series of loads to get several cache 
lines staged for loading, each element of data is not only moved 
into cache, but into registers. This is pipelined so that after this 
initial set of loads is performed, then additional loads are 
interleaved with non-cache conflicting stores to move new values 
into memory. By separating the time of loads and stores for 
matching cache lines, full pipelining and multi-entry cache benefits 
is obtained. 

Col. 2, lines 48-57 
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While McGehearty does teach a mechanism for cache collision avoidance, 
the mechanism is not based on "detail loops" as required in Appellants' claimed 
invention. Instead, McGehearty implements a cache manager 19 that monitors 
and controls the flow of data into and out of the cache 10. When data is being 
loaded into the cache 10 from memory 15, the cache manager 19 compares (by 
subtraction) the memory source address and the memory destination address to 
determine whether to implement the enhanced mode, the near cache collision 
mode or the exact cache collision mode. In contrast, Appellants' claimed 
invention reduces cache thrashing (collisions) by distributing vector memory 
references into detail loops. 

C. Did amendments to claims 1, 9, 17, 24, 26 (new claim 41), 33 
and 35 introduce new matter? 

As shown in the "SUMMARY OF THE CLAIMED SUBJECT MATTER" 

section, the limitations of claims 1, 9, 17, 24, 26 (new claim 41), 33 and 35 are 

supported by Appellants' specification. Accordingly, the Examiner improperly 

objected to amendments to claims 1, 9, 17, 24, 26 (new claim 41), 33 and 35 as 

introducing new matter under 35 U.S.C § 132 and 37 C.F.R. § 1.121(f). 

D. Claims 1-8, witli claim 1 representing this group 

Claim 1 , in part, requires "determining dependencies between vector 
memory references in [a] loop, including determining unidirectional and circular 
dependencies." Claim 1 further requires "distributing the vector memory 
references into a plurality of detail loops configured to allocate the vector memory 
references into a plurality of temporary arrays, sized and located, so that none of 
the vector memory references are cache synonyms." Claim 1 further requires 
"analyzing an execution profile of the program after said distributing" and "based 
on the execution profile, determining whether to repeat... said determining 
dependencies, and said distributing." 

The Examiner's obviousness rejection of claim 1 in view of Carr and 
McGehearty is improper for at least three reasons. First, the Examiner admits 
that Carr does not disclose Appellants' claimed "allocat[ing] the vector memory 
references into a plurality of temporary arrays, sized and located, so that none of 
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the vector memory references are cache synonyms," and relies on McGehearty 
as teaching this limitation (see final Office action, page 4, third paragraph). 
However, McGehearty does not appear to teach or suggest "detail loops" are 
used to "allocate the vector memory references into a plurality of temporary 
arrays" as required in claim 1. Instead, McGehearty teaches a cache manager 
19 that monitors and controls the flow of data into and out of the cache 1 1 . "To 
avoid cache collisions that occur when there is an exact mapping of source and 
destination addresses, the manager 19 loads the data from the memory that 
would fill multiple cache lines into different registers, via the cache, to start a 
pipelined copy operation" (See Fig. 1 and Col. 6, lines 23-27). Appellants cannot 
find a teaching or suggestion in McGehearty that "detail loops... allocate the 
vector memory references into a plurality of temporary arrays" as required in 
claim 1. 

Second, there is no clear motivation to combine Carr with McGehearty "to 
avoid cache collisions" as suggested by the Examiner. As previously discussed, 
Carr optimizes data locality based on a cost model. "The model computes both 
temporal and spatial reuse of cache lines to find desirable loop organizations" 
(Page 252, Col. L, lines 7-8). In contrast, McGehearty's cache collision 
avoidance mode restructures loops when a cache manager 19 determines that 
data to be stored in a cache 11 will cause cache collisions (Col. 3, lines 50-55 
and col. 6, lines 23-32). Because Carr's algorithm is intended to optimize data 
locality and loop organizations, one of ordinary skill in the art would not think to 
combine with Carr with McGehearty's loop restructuring mechanism. In other 
words, McGehearty's loop restructuring mechanism would not be needed if loops 
have been organized to optimize data locality according to Carr. Likewise, if 
McGehearty's loop restructuring mechanism is implemented, there is no apparent 
need to optimize data locality and loop organization according to Carr. 

Third, none of the references cited by the Examiner, nor combinations of 
the references, teaches or suggests "analyzing an execution profile of the 
program after said distributing" and "based on the execution profile, determining 
whether to repeat... said determining dependencies, and said distributing" as 
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required in claim 1 . On the contrary, Carr states "our approach only performs one 
evaluation step because it directly determines the best loop permutation" (Page 
253, Col. L, lines 22-24). For at least these reasons, Appellants submit that the 
Examiner improperly rejected claim 1 and that claims 1-8 should be allowed. 
E. Claims 9-16, with claim 9 representing this group 
Claim 9, in part, requires "distributing the vector memory references into a 
plurality of detail loops that serially proceed through strips of the vector memory 
references and store the strips in temporary arrays so that none of the vector 
memory references are cache synonyms". Claim 9 further requires "said 
distributing the vector memory references into a plurality of detail loops is 
performed by a first computer for execution by a second computer." 

The Examiner's obviousness rejection of claim 9 in view of Carr and 
McGehearty is improper for at least three reasons. First, the Examiner admits 
that Carr does not disclose Appellants' claimed "a plurality of detail loops that 
serially proceed through strips of vector memory references and store the strips in 
temporary arrays so that none of the vector memory references are cache 
synonyms" and relies on McGehearty as teaching this limitation (see Final Office 
Action, page 8, first and second paragraphs). However, McGehearty does not 
appear to teach or suggest " detail loops that serially proceed through strips of 
vector memory references and store the strips in temporary arrays so that none of 
the vector memory references are cache synonyms" as required in claim 9. 
Instead, McGehearty teaches "[t]o avoid cache collisions that occur when there is 
an exact mapping of source and destination addresses, the [cache] manager 19 
loads the data from the memory that would fill multiple cache lines into different 
registers, via the cache, to start a pipelined copy operation" (See Fig. 1 and Col. 
6, lines 23-27). Thus, McGehearty appears to teach a cache manager 19 rather 
than Appellants' claimed "detail loops" is responsible for cache collision 
avoidance. 

Second, as previously discussed with respect to claim 1, there is no clear 
motivation to combine Carr with McGehearty "to avoid cache collisions" as 
suggested by the Examiner. Third, none of the references cited by the Examiner, 
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nor combinations of the references, appear to teach or suggest "said distributing 
the vector memory references into a plurality of detail loops is performed by a first 
computer for execution by a second computer" as required in claim 9. For at 
least these reasons, Appellants submit that the Examiner improperly rejected 
claim 9 and that claims 9-16 should be allowed. 

F. Claims 17-23, with claim 17 representing this group 

Claim 17, in part, requires "distributing the vector memory references into 
a plurality of detail loops in response to cache behavior and the dependencies 
between the vector memory references in the loop, wherein the detail loops 
cause storage of the vector memory references in temporary arrays that are 
allocated consecutively so that no temporary [array] elements are cache 
synonyms." Claim 17 further requires "said distributing the vector memory 
references into a plurality of detail loops produce[s] code that is substantially 
independent of a computer architecture" and "performing code optimizations that 
are dependent on a computer architecture after said distributing." 

The Examiner's obviousness rejection of claim 17 in view of Carr and 
McGehearty is improper for at least three reasons. First, the Examiner 
recognizes that Carr does not teach Appellants' claimed "storage of the vector 
memory references in temporary arrays that are allocated consecutively so that 
no temporary [array] elements are cache synonyms" and relies on McGehearty 
as teaching this limitation (see final Office action, page 10, first and second 
paragraphs). However, McGehearty does not appear to teach or suggest " detail 
loops [that] cause storage of the vector memory references in temporary arrays 
that are allocated consecutively so that no temporary [array] elements are cache 
synonyms" as required in claim 17. Instead of Appellants' claimed "detail loops," 
McGehearty relies on a cache manager 19 that loads data from the memory that 
would fill multiple cache lines into different registers, via the cache, to start a 
pipelined copy operation. 

Second, as previously discussed with respect to claim 1, there is no clear 
motivation to combine Carr with McGehearty "to avoid cache collisions" as 
suggested by the Examiner. Third, none of the references cited by the Examiner, 
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nor combinations of the references, appear to teach or suggest "said distributing 

the vector memory references into a plurality of detail loops produce[s] code that 

is substantially independent of a computer architecture" and "performing code 

optimizations that are dependent on a computer architecture after said 

distributing." For at least these reasons, Appellants submit that the Examiner 

improperly rejected claim 17 and that claims 17-23 should be allowed. 

G. Claims 24, 35-37 and 39-40, with claim 35 representing this 
group 

Claim 35, in part, requires "identifying a portion of the memory references 
based on the profile, wherein the portion of the memory references is determined 
to cause cache thrashing." Claim 35 further requires "reducing cache thrashing 
by distributing the portion of the memory references into distinct loops that 
allocate strips of the memory references into temporary arrays for execution." 
Claim 35 further required "the temporary arrays are configured to simultaneously 
fit in a single cache bank." 

The Examiner's obviousness rejection of claim 35 in view of Carr and 
McGehearty is improper for at least three reasons. First, none of the references 
cited by the Examiner, nor combinations of the references, teaches or suggests 
"identifying a portion of the memory references... determined to cause cache 
thrashing" and "reducing cache thrashing by distributing the portion of the 
memory references into distinct loops that allocate strips of the memory 
references into temporary arrays for execution" as required in claim 35. While 
McGehearty teaches a cache collision avoidance mode that "restructures [a] loop 
of moving data," McGehearty does not teach or suggest Appellants' claimed 
"distributing the portion of the memory references into distinct loops that allocate 
strips of the memory references into temporary arrays for execution." 

Second, as previously discussed with respect to claim 1, there is no clear 
motivation to combine Carr with McGehearty "to avoid cache collisions" as 
suggested by the Examiner. Third, none of the references cited by the Examiner, 
nor combinations of the references, appear to teach or suggest "the temporary 
arrays are configured to simultaneously fit in a single cache bank" as required in 
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claim 35. The Examiner cites Carr as teaching "knowledge of the cache size, 
associativity, and replacement policy is essential" (see final Office action, page 5, 
last paragraph). However, the Carr reference still does not teach or suggest 
"temporary arrays are configured to simultaneously fit in a single cache bank" as 
required in claim 35. McGehearty is likewise deficient in this regard. For at least 
these reasons, Appellants submit that the Examiner improperly rejected claim 35 
and that claims 24, 35-37 and 39-40 should be allowed. 

H. Claims 33 and 34, with claim 33 representing this group 

Claim 33, in part, requires "modifying the identified portion of the software 
to reduce the likelihood of cache thrashing by distributing cache synonyms into 
detail loops configured to allocate the cache synonyms into temporary storage 
areas, sized and located, to prevent cache thrashing." Claim 33 further requires 
"said modifying occurs before optimizations that are based on an architecture of 
the computer system." 

The Examiner's obviousness rejection of claim 33 in view of Carr and 
McGehearty is improper for at least three reasons. First, the Examiner admits 
that Carr does not teach "distributing cache synonyms into detail loops configured 
to allocate the cache synonyms into temporary storage areas, sized and located, 
to prevent cache thrashing" and relies on McGehearty as teaching this limitation. 
However, McGehearty does not appear to teach or suggest " detail loops 
configured to allocate the cache synonyms into temporary storage areas, sized 
and located, to prevent cache thrashing" as required in claim 33. Instead of 
Appellants' claimed "detail loops," McGehearty relies on a cache manager 19 that 
loads data from the memory that would fill multiple cache lines into different 
registers, via the cache, to start a pipelined copy operation. 

Second, as previously discussed with respect to claim 1, there is no clear 
motivation to combine Carr with McGehearty "to avoid cache collisions" as 
suggested by the Examiner. Third, none of the references cited by the Examiner, 
nor combinations of the references, appear to teach or suggest "said [distributing 
cache synonyms into detail loops] occurs before optimizations that are based on 
an architecture of the computer system." For at least these reasons. Appellants 
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submit that the Examiner improperly rejected claim 33 and that claims 33 and 34 
should be allowed. 

I. Claims 41-47, with claim 41 representing this group 

Claim 41, in part, requires "means for distributing the vector memory 
references into a plurality of detail loops configured to serially process strips of 
the vector memory references so that thrashing does not occur." Claim 41 further 
requires "means for determining an execution profile of the program after said 
distributing occurs" and "means for selectively repeating... said means for 
distributing the vector memory references into a plurality of detail loops." 

The Examiner improperly rejected the limitations of claim 41 in view of 
Carr and McGehearty for at least three reasons. First, the Examiner recognizes 
that Carr does not teach Appellants' claimed "distributing the vector memory 
references into a plurality of detail loops configured to serially process strips of 
the vector memory references so that thrashing does not occur" and relies on 
McGehearty as teaching this limitation (see final Office action, page 8, first 
paragraph). However, McGehearty does not appear to teach or suggest " detail 
loops configured to serially process strips of the vector memory references so 
that thrashing does not occur" as required in claim 41. Instead of Appellants* 
claimed "detail loops," McGehearty relies on a cache manager 19 that loads data 
from the memory that would fill multiple cache lines into different registers, via the 
cache, to start a pipelined copy operation. 

Second, as previously discussed with respect to claim 1, there is no clear 
motivation to combine Carr with McGehearty "to avoid cache collisions" as 
suggested by the Examiner. Third, none of the references cited by the Examiner, 
nor combinations of the references, appear to teach or suggest "selectively 
repeating... distributing the vector memory references into a plurality of detail 
loops." On the contrary, Carr states "our approach only performs one evaluation 
step because it directly determines the best loop permutation" (Page 253, Col. L, 
lines 22-24). For at least these reasons. Appellants submit that the Examiner 
improperly rejected the limitations of claim 41 and that claims 41-47 should be 
allowed. 
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VIII. CONCLUSION 

For the reasons stated above, Appellants respectfully submit that the 
Examiner erred in rejecting all pending claims. It is believed that no extensions 
of time or fees are required, beyond those that may otherwise be provided for in 
documents accompanying this paper. However, In the event that additional 
extensions of time are necessary to allow consideration of this paper, such 
extensions are hereby petitioned under 37 C.F.R. § 1.136(a), and any fees 
required (including fees for net addition of claims) are hereby authorized to be 
charged to Hewlett-Packard Development Company's Deposit Account No. 08- 
2025. 

Respectfully submitted, 
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IX. CLAIMS APPENDIX 

CLAIM SET 1 (CLAIMS 1-24 and 33-47) 

1 (Previously presented) A method, comprising: 

identifying a loop in a program; 

identifying each vector memory reference in the loop; 

determining dependencies between vector memory references in the 
loop, including determining unidirectional and circular dependencies; 

distributing the vector memory references into a plurality of detail loops 
configured to allocate the vector memory references into a plurality 
of temporary arrays, sized and located, so that none of the vector 
memory references are cache synonyms, wherein the vector 
memory references that have circular dependencies therebetween 
are included in a common detail loop, and wherein the detail loops 
are ordered according to the unidirectional dependencies between 
the memory references; 

analyzing an execution profile of the program after said distributing; and 

based on the execution profile, determining whether to repeat said 
identifying a loop, said identifying each vector memory reference, 
said determining dependencies, and said distributing. 

2. (Original) A method, as set forth in claim 1 , further comprising allocating a 
plurality of temporary storage areas within a cache and determining the size of 
each temporary storage area based on the size of the cache and the number of 
temporary storage areas. 

3. (Original) A method, as set forth in claim 1, further comprising at least one 
section loop including the plurality of detail loops. 

4. (Original) A method, as set forth in claim 1, wherein distributing the vector 
memory references into a plurality of detail loops further comprises distributing 
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the vector memory references into a plurality of detail loops that each contain at 
least one vector memory reference that could benefit from cache management. 

5. (Original) A method, as set forth in claim 1, further comprising inserting 
cache management instructions into at least one of said detail loops to control 
movement of data associated with the vector memory reference between a cache 
and main memory. 

6. (Original) A method, as set forth in claim 1, further comprising inserting 
prefetch instructions into at least one of said detail loops to control movement of 
data associated with the vector memory reference between a cache and main 
memory. 

7. (Original) A method, as set forth in claim 1, further comprising performing 
loop unrolling on at least one of said detail loops to control movement of data 
associated with the vector memory reference between a cache and main 
memory. 

8. (Original) A method, as set forth in claim 1, further comprising inserting at 
least one of a prefetch instruction and a cache management instruction into at 
least one of said detail loops to control movement of data associated with the 
vector memory reference between a cache and main memory, and performing 
loop unrolling on at least one of said detail loops to control movement of data 
associated with the vector memory reference between a cache and main 
memory. 

9. (Previously presented) A method, comprising: 
identifying a loop in a program; 

identifying each vector memory reference in the loop; 

determining dependencies between vector memory references in the 
loop; and 
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distributing the vector memory references into a plurality of detail loops 
that serially proceed through strips of the vector memory references 
and store the strips in temporary arrays so that none of the vector 
memory references are cache synonyms, wherein the vector 
memory references that have dependencies therebetween are 
included in a common detail loop; 

wherein said distributing the vector memory references into a plurality of 
detail loops is performed by a first computer for execution by a 
second computer. 

10. (Original) A method, as set forth in claim 9, further comprising allocating a 
plurality of temporary storage areas within a cache and determining the size of 
each temporary storage area based on the size of the cache and the number of 
temporary storage areas. 

11. (Original) A method, as set forth in claim 9, further comprising at least one 
section loop including the plurality of detail loops. 

12. (Original) A method, as set forth in claim 9, wherein distributing the vector 
memory references into a plurality of detail loops further comprises distributing 
the vector memory references into a plurality of detail loops that each contain at 
least one vector memory reference that could benefit from cache management. 

13. (Original) A method, as set forth in claim 9, further comprising inserting 
cache management instructions into at least one of said detail loops to control 
movement of data associated with the vector memory reference between a cache 
and main memory. 

14. (Original) A method, as set forth in claim 9, further comprising inserting 
prefetch instructions into at least one of said detail loops to control movement of 
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data associated with the vector memory reference between a cache and main 
memory. 

15. (Original) A method, as set forth in claim 9, further comprising performing 
loop unrolling on at least one of said detail loops to control movement of data 
associated with the vector memory reference between a cache and main 
memory. 

16. (Original) A method, as set forth in claim 9, further comprising inserting at 
least one of a prefetch instruction and a cache management instruction into at 
least one of said detail loops to control movement of data associated with the 
vector memory reference between a cache and main memory, and performing 
loop unrolling on at least one of said detail loops to control movement of data 
associated with the vector memory reference between a cache and main 
memory. 

17. (Previously presented) A method, comprising: 
identifying a loop in a program; 

identifying each vector memory reference in the loop; 

determining dependencies between vector memory references in the loop; 

distributing the vector memory references into a plurality of detail loops in 
response to cache behavior and the dependencies between the 
vector memory references in the loop, wherein the detail loops 
cause storage of the vector memory references in temporary arrays 
that are allocated consecutively so that no temporary arrays 
elements are cache synonyms, 

wherein said identifying a loop, said identifying each vector memory 
reference, said determining dependencies between vector memory 
references and said distributing the vector memory references into 
a plurality of detail loops produce code that is substantially 
independent of a computer architecture; and 
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performing code optimizations that are dependent on a computer 
arcliitecture after said distributing. 

18. (Original) A method, as set forth in claim 17, wherein distributing the 
vector memory references further comprises distributing the vector memory 
references into the plurality of detail loops with each loop having at least one of 
the identified vector memory references. 

19. (Original) A method, as set forth in claim 17, further comprising 
determining dependencies between vector memory references in the loop, and 
wherein distributing the loop includes distributing the vector memory references 
into the plurality of detail loops, wherein the vector memory references that have 
circular dependencies therebetween are included in a common detail loop. 

20. (Original) A method, as set forth in claim 17, further comprising inserting 
cache management instructions into at least one of said detail loops to control 
movement of data associated with the vector memory reference between a cache 
and main memory. 

21. (Original) A method, as set forth in claim 17, further comprising inserting 
prefetch instructions into at least one of said detail loops to control movement of 
datia associated with the vector memory reference between a cache and main 
memory. 

22. (Original) A method, as set forth in claim 17, further comprising performing 
loop unrolling on at least one of said detail loops to control movement of data 
associated with the vector memory reference between a cache and main 
memory. 



23. (Original) A method, as set forth in claim 17, further comprising inserting 
at least one of a prefetch instruction and a cache management instruction into at 

159991.01/1662.48400 PagO 33 Of 50 HP PDNO 200301732-1 



AppL No. 09/785,143 

Appeal Brief dated November 28, 2005 

Reply to final Office action of July 27, 2005 

least one of said detail loops to control nriovement of data associated with the 
vector memory reference between a cache and main memory, and performing 
loop unrolling on at least one of said detail loops to control movement of data 
associated with the vector memory reference between a cache and main 
memory. 

24. (Previously presented) A computer programmed to perform a method, 
comprising: 

identifying a loop in a program; 

identifying each vector memory reference in the loop; 

determining dependencies between vector memory references in the 
loop; and 

distributing the vector memory references into a plurality of detail loops 
configured to retrieve strips of the vector memory references and 
store the strips in temporary arrays, wherein the vector memory 
references that have circular dependencies therebetween are 
included in a common detail loop, 

wherein the temporary arrays are configured to simultaneously fit in a 
single cache bank. 

25-32. (Canceled). 

33. (Previously presented) A method for reducing the likelihood of cache 
thrashing by software to be executed on a computer system having a cache, 
comprising: 

executing the software on the computer system; 

generating a profile indicating the manner in which the software uses the 
cache; 

identifying a portion of the software that exhibits cache thrashing based on 
the profile data; and 



159991.01/1662.48400 



Page 34 of 50 



HP PDNO 200301732-1 



AppL No. 09/785,143 

Appeal Brief dated November 28, 2005 

Reply to final Office action of July 27, 2005 



modifying the identified portion of the software to reduce the likelihood of 
cache thrashing by distributing cache synonyms into detail loops 
configured to allocate the cache synonyms into temporary storage 
areas, sized and located, to prevent cache thrashing, 

wherein said modifying occurs before optimizations that are based on an 
architecture of the computer system. 

34. (Previously presented) A method, as set forth in claim 33, wherein 
modifying the identified portion of the software to reduce the likelihood of cache 
thrashing further comprises: 

identifying a loop in the identified portion of the software; 

identifying each vector memory reference in the identified loop; 

determining dependencies between the vector memory references in the 
identified loop of the software, including determining unidirectional 
and circular dependencies; and 

reducing cache thrashing by distributing the vector memory references into 
a plurality of detail loops, wherein the vector memory references 
that have circular dependencies therebetween are included in a 
common detail loop, and wherein the detail loops are ordered 
according to the unidirectional dependencies between the memory 
references. 

35. (Previously presented) A method for reducing the likelihood of cache 
thrashing by software to be executed on a computer system having a cache, 
comprising: 

executing the software on the computer system; 

generating a profile indicating the manner in which the memory references 

of the software use the cache; 
identifying a portion of the memory references based on the profile, 

wherein the portion of the memory references is determined to 

cause cache thrashing; and 
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reducing cache thrashing by distributing the portion of the memory 
references into distinct loops that allocate strips of the memory 
references into temporary arrays for execution, 

wherein the temporary arrays are configured to simultaneously fit in a 
single cache bank. 

36. (Previously presented) The computer of claim 24 wherein the temporary 
arrays are allocated consecutively such that no temporary array elements are 
cache synonyms. 

37. (Previously presented) The computer of claim 24 wherein the details loops 
are allocated into section loops that cause iterative execution of the detail loops 
based on a size of the strips. 

38. (Previously presented) The program storage medium of claim 25 wherein 
the temporary arrays are allocated consecutively such that no temporary array 
elements are cache synonyms. 

39. (Previously presented) The method of claim 35 wherein the temporary 
arrays are located and sized to reduce cache thrashing. 

40. (Previously presented) The method of claim 35 wherein the temporary 
arrays are allocated consecutively and iteratively executed by a size of the strips. 

41 . (Previously presented) A compiler, comprising: 
means for identifying a loop in a program; 

means for identifying each vector memory reference in the loop; 

means for determining dependencies between vector memory references 

in the loop, including determining unidirectional and circular 

dependencies; 
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means for distributing the vector memory references into a plurality of 
detail loops configured to serially process strips of the vector 
memory references so that thrashing does not occur, wherein the 
vector memory references that have circular dependencies 
therebetween are included in a common detail loop, and wherein 
the detail loops are ordered according to the unidirectional 
dependencies between the memory references; 

means for determining an execution profile of the program after said 
distributing occurs; and 

means for selectively repeating use of said means for identifying a loop, 
said means for identifying each vector memory reference, said 
means for determining dependencies, said means for distributing 
the vector memory references into a plurality of detail loops, and 
said means for determining an execution profile based on said 
execution profile. 

42. (Original) A compiler, as set forth in claim 41, further comprising means 
for allocating a plurality of temporary storage areas within a cache and 
determining the size of each temporary storage area based on the size of the 
cache and the number of temporary storage areas. 

43. (Original) A compiler, as set forth in claim 41, wherein the means for 
distributing the vector memory references into a plurality of detail loops further 
comprises distributing the vector memory references into a plurality of detail loops 
that each contain at least one vector memory reference that could benefit from 
cache management. 

44. (Original) A compiler, as set forth in claim 41, further comprising inserting 
cache management instructions into at least one of said detail loops to control 
movement of data associated with the vector memory reference between a cache 
and main memory. 
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45. (Original) A compiler, as set forth in claim 41, further comprising means 
for inserting prefetch instructions into at least one of said detail loops to control 
movement of data associated with the vector memory reference between a cache 
and main memory. 

46. (Original) A compiler, as set forth in claim 41, further comprising means 
for performing loop unrolling on at least one of said detail loops to control 
movement of data associated with the vector memory reference between a cache 
and main memory. 

47. (Original) A compiler, as set forth in claim 41, further comprising means 
for inserting at least one of a prefetch instruction and a cache management 
instruction into at least one of said detail loops to control movement of data 
associated with the vector memory reference between a cache and main 
memory, and performing loop unrolling on at least one of said detail loops to 
control movement of data associated with the vector memory reference between 
a cache and main memory. 
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CLAIM SET 2 (CLAIMS 1-40) 

1. (Previously presented) A method, comprising: 
identifying a loop in a program; 

identifying each vector memory reference in the loop; 

determining dependencies between vector memory references in the 
loop, including determining unidirectional and circular dependencies; 

distributing the vector memory references into a plurality of detail loops 
configured to allocate the vector memory references into a plurality 
of temporary arrays, sized and located, so that none of the vector 
memory references are cache synonyms, wherein the vector 
memory references that have circular dependencies therebetween 
are included in a common detail loop, and wherein the detail loops 
are ordered according to the unidirectional dependencies between 
the memory references; 

analyzing an execution profile of the program after said distributing; and 

based on the execution profile, determining whether to repeat said 
identifying a loop, said identifying each vector memory reference, 
said determining dependencies, and said distributing. 

2. (Original) A method, as set forth in claim 1, further comprising allocating a 
plurality of temporary storage areas within a cache and determining the size of 
each temporary storage area based on the size of the cache and the number of 
temporary storage areas. 

3. (Original) A method, as set forth in claim 1, further comprising at least one 
section loop including the plurality of detail loops. 

4. (Original) A method, as set forth in claim 1, wherein distributing the vector 
memory references into a plurality of detail loops further comprises distributing 
the vector memory references into a plurality of detail loops that each contain at 
least one vector memory reference that could benefit from cache management. 
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5. (Original) A method, as set forth in claim 1, further comprising inserting 
cache management instructions into at least one of said detail loops to control 
movement of data associated with the vector memory reference between a cache 
and main memory. 

6. (Original) A method, as set forth in claim 1, further comprising inserting 
prefetch instructions into at least one of said detail loops to control movement of 
data associated with the vector memory reference between a cache and main 
memory. 

7. (Original) A method, as set forth in claim 1, further comprising performing 
loop unrolling on at least one of said detail loops to control movement of data 
associated with the vector memory reference between a cache and main 
memory. 

8. (Original) A method, as set forth in claim 1, further comprising inserting at 
least one of a prefetch instruction and a cache management instruction into at 
least one of said detail loops to control movement of data associated with the 
vector memory reference between a cache and main memory, and performing 
loop unrolling on at least one of said detail loops to control movement of data 
associated with the vector memory reference between a cache and main 
memory. 

9. (Previously presented) A method, comprising: 
identifying a loop in a program; 

identifying each vector memory reference in the loop; 

determining dependencies between vector memory references in the 
loop; and 

distributing the vector memory references into a plurality of detail loops 
that serially proceed through strips of the vector memory references 
and store the strips in temporary arrays so that none of the vector 
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memory references are cache synonyms, wherein the vector 
memory references that have dependencies therebetween are 
included in a common detail loop; 
wherein said distributing the vector memory references into a plurality of 
detail loops is performed by a first computer for execution by a 
second computer. 

10. (Original) A method, as set forth in claim 9, further comprising allocating a 
plurality of temporary storage areas within a cache and determining the size of 
each temporary storage area based on the size of the cache and the number of 
temporary storage areas. 

11. (Original) A method, as set forth in claim 9, further comprising at least one 
section loop including the plurality of detail loops. 

12. (Original) A method, as set forth in claim 9, wherein distributing the vector 
memory references into a plurality of detail loops further comprises distributing 
the vector memory references into a plurality of detail loops that each contain at 
least one vector memory reference that could benefit from cache management. 

13. (Original) A method, as set forth in claim 9, further comprising inserting 
cache management instructions into at least one of said detail loops to control 
movement of data associated with the vector memory reference between a cache 
and main memory. 

14. (Original) A method, as set forth in claim 9, further comprising inserting 
prefetch instructions into at least one of said detail loops to control movement of 
data associated with the vector memory reference between a cache and main 
memory. 
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15. (Original) A method, as set forth in claim 9, further comprising peri'orming 
loop unrolling on at least one of said detail loops to control movement of data 
associated with the vector memory reference between a cache and main 
memory. 

16. (Original) A method, as set forth in claim 9, further comprising inserting at 
least one of a prefetch instruction and a cache management instruction into at 
least one of said detail loops to control movement of data associated with the 
vector memory reference between a cache and main memory, and performing 
loop unrolling on at least one of said detail loops to control movement of data 
associated with the vector memory reference between a cache and main 
memory. 

17. (Previously presented) A method, comprising: 
identifying a loop in a program; 

identifying each vector memory reference in the loop; 

determining dependencies between vector memory references in the loop; 

distributing the vector memory references into a plurality of detail loops in 
response to cache behavior and the dependencies between the 
vector memory references in the loop, wherein the detail loops 
cause storage of the vector memory references in temporary arrays 
that are allocated consecutively so that no temporary arrays 
elements are cache synonyms, 

wherein said identifying a loop, said identifying each vector memory 
reference, said determining dependencies between vector memory 
references and said distributing the vector memory references into 
a plurality of detail loops produce code that is substantially 
independent of a computer architecture; and 

performing code optimizations that are dependent on a computer 
architecture after said distributing. 



159991.01/1662.48400 



Page 42 of 50 



HP PDNO 200301732-1 



AppL No- 09/785,143 

Appeal Brief dated November 28, 2005 

Reply to final Office action of July 27, 2005 

18. (Original) A method, as set forth in claim 17, wherein distributing the 
vector memory references further comprises distributing the vector memory 
references into the plurality of detail loops with each loop having at least one of 
the identified vector memory references. 

19. (Original) A method, as set forth in claim 17, further comprising 
determining dependencies between vector memory references in the loop, and 
wherein distributing the loop includes distributing the vector memory references 
into the plurality of detail loops, wherein the vector memory references that have 
circular dependencies therebetween are included in a common detail loop. 

20. (Original) A method, as set forth in claim 17, further comprising inserting 
cache management instructions into at least one of said detail loops to control 
movement of data associated with the vector memory reference between a cache 
and main memory. 

21. (Original) A method, as set forth in claim 17, further comprising inserting 
prefetch instructions into at least one of said detail loops to control movement of 
data associated with the vector memory reference between a cache and main 
memory, 

22. (Original) A method, as set forth in claim 17, further comprising performing 
loop unrolling on at least one of said detail loops to control movement of data 
associated with the vector memory reference between a cache and main 
memory. 

23. (Original) A method, as set forth in claim 17, further comprising inserting 
at least one of a prefetch instruction and a cache management instruction into at 
least one of said detail loops to control movement of data associated with the 
vector memory reference between a cache and main memory, and performing 
loop unrolling on at least one of said detail loops to control movement of data 
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associated with tine vector memory reference between a cache and main 
memory. 

24. (Previously presented) A computer programmed to perform a method, 
comprising: 

identifying a loop in a program; 
identifying each vector memory reference in the loop; 
determining dependencies between vector memory references in the 
loop; and 

distributing the vector memory references into a plurality of detail loops 
configured to retrieve strips of the vector memory references and 
store the strips in temporary arrays, wherein the vector memory 
references that have circular dependencies therebetween are 
included in a common detail loop, 

wherein the temporary arrays are configured to simultaneously fit in a 
single cache bank. 

25. (Previously presented) A program storage medium encoded with 
instructions that, when executed by a computer, perform a method, comprising: 

identifying a loop in a program; 
identifying each vector memory reference in the loop; 
determining dependencies between vector niemory references in the 
loop; and 

generating an expanded code of the program by distributing the vector 
memory references into a plurality of detail loops configured to 
allocate the vector memory references into temporary arrays that 
avoid cache synonyms, wherein the vector memory references that 
have circular dependencies therebetween are included in a 
common detail loop, 

wherein the expanded code is substantially independent of computer 
architectures. 
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26. (Previously presented) A compiler, comprising: 
means for identifying a loop in a program; 

means for identifying each vector memory reference in the loop; 

means for determining dependencies between vector memory references 

in the loop, including determining unidirectional and circular 

dependencies; 

means for distributing the vector memory references into a plurality of 
detail loops configured to serially process strips of the vector 
memory references so that thrashing does not occur, wherein the 
vector memory references that have circular dependencies 
therebetween are included in a common detail loop, and wherein 
the detail loops are ordered according to the unidirectional 
dependencies between the memory references; 

means for determining an execution profile of the program after said 
distributing occurs; and 

means for selectively repeating use of said means for identifying a loop, 
said means for identifying each vector memory reference, said 
means for determining dependencies, said means for distributing 
the vector memory references into a plurality of detail loops, and 
said means for determining an execution profile based on said 
execution profile. 

27. (Original) A compiler, as set forth in claim 26, further comprising means 
for allocating a plurality of temporary storage areas within a cache and 
determining the size of each temporary storage area based on the size of the 
cache and the number of temporary storage areas. 

28. (Original) A compiler, as set forth in claim 26, wherein the means for 
distributing the vector memory references into a plurality of detail loops further 
comprises distributing the vector memory references into a plurality of detail loops 
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that each contain at least one vector memory reference that could benefit from 
cache management. 

29. (Original) A compiler, as set forth in claim 26, further comprising inserting 
cache management instructions into at least one of said detail loops to control 
movement of data associated with the vector memory reference between a cache 
and main memory. 

30. (Original) A compiler, as set forth in claim 26, further comprising means 
for inserting prefetch instructions into at least one of said detail loops to control 
movement of data associated with the vector memory reference between a cache 
and main memory. 

31. (Original) A compiler, as set forth in claim 26, further comprising means 
for performing loop unrolling on at least one of said detail loops to control 
movement of data associated with the vector memory reference between a cache 
and main memory. 

32. (Original) A compiler, as set forth in claim 26, further comprising means 
for inserting at least one of a prefetch instruction and a cache management 
instruction into at least one of said detail loops to control movement of data 
associated with the vector memory reference between a cache and main 
memory, and performing loop unrolling on at least one of said detail loops to 
control movement of data associated with the vector memory reference between 
a cache and main memory. 

33. (Previously presented) A method for reducing the likelihood of cache 
thrashing by software to be executed on a computer system having a cache, 
comprising: 

executing the software on the computer system; 
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generating a profile indicating the manner in which the software uses the 
cache; 

identifying a portion of the software that exhibits cache thrashing based on 
the profile data; and 

modifying the identified portion of the software to reduce the likelihood of 
cache thrashing by distributing cache synonyms into detail loops 
configured to allocate the cache synonyms into temporary storage 
areas, sized and located, to prevent cache thrashing, 

wherein said modifying occurs before optimizations that are based on an 
architecture of the computer system. 

34. (Previously presented) A method, as set forth in claim 33, wherein 
modifying the identified portion of the software to reduce the likelihood of cache 
thrashing further comprises: 

identifying a loop in the identified portion of the software; 

identifying each vector memory reference in the identified loop; 

determining dependencies between the vector memory references in the 
identified loop of the software, including determining unidirectional 
and circular dependencies; and 

reducing cache thrashing by distributing the vector memory references into 
a plurality of detail loops, wherein the vector memory references 
that have circular dependencies therebetween are included in a 
common detail loop, and wherein the detail loops are ordered 
according to the unidirectional dependencies between the memory 
references. 

35. (Previously presented) A method for reducing the likelihood of cache 
thrashing by software to be executed on a computer system having a cache, 
comprising: 

executing the software on the computer system; 
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generating a profile indicating the manner in which the memory references 

of the software use the cache; 
identifying a portion of the memory references based on the profile, 

wherein the portion of the memory references is determined to 

cause cache thrashing; and 
reducing cache thrashing by distributing the portion of the memory 

references into distinct loops that allocate strips of the memory 

references into temporary arrays for execution, 
wherein the temporary arrays are configured to simultaneously fit in a 

single cache bank. 

36. (Previously presented) The computer of claim 24 wherein the temporary 
arrays are allocated consecutively such that no temporary array elements are 
cache synonyms. 

37. (Previously presented) The computer of claim 24 wherein the details loops 
are allocated into section loops that cause iterative execution of the detail loops 
based on a size of the strips. 

38. (Previously presented) The program storage medium of claim 25 wherein 
the temporary arrays are allocated consecutively such that no temporary array 
elements are cache synonyms. 

39. (Previously presented) The method of claim 35 wherein the temporary 
arrays are located and sized to reduce cache thrashing. 

40. (Previously presented) The method of claim 35 wherein the temporary 
arrays are allocated consecutively and iteratively executed by a size of the strips. 
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X. EVIDENCE APPENDIX 

None. 
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XI. RELATED PROCEEDINGS APPENDIX 

None. 
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